Flame (v2.0): advanced integration and interpretation of functional enrichment results from multiple sources

Abstract   Summary: Functional enrichment is the process of identifying implicated functional terms from a given input list of genes or proteins. In this article, we present Flame (v2.0), a web tool which offers a combinatorial approach through merging and visualizing results from widely used functional enrichment applications while also allowing various flexible input options. In this version, Flame utilizes the aGOtool, g: Profiler, WebGestalt, and Enrichr pipelines and presents their outputs separately or in combination following a visual analytics approach. For intuitive representations and easier interpretation, it uses interactive plots such as parameterizable networks, heatmaps, barcharts, and scatter plots. Users can also: (i) handle multiple protein/gene lists and analyse union and intersection sets simultaneously through interactive UpSet plots, (ii) automatically extract genes and proteins from free text through text-mining and Named Entity Recognition (NER) techniques, (iii) upload single nucleotide polymorphisms (SNPs) and extract their relative genes, or (iv) analyse multiple lists of differentially expressed proteins/genes after selecting them interactively from a parameterizable volcano plot. Compared to the previous version of 197 supported organisms, Flame (v2.0) currently allows enrichment for 14 436 organisms. Availability and implementation Web Application: http://flame.pavlopouloslab.info. Code: https://github.com/PavlopoulosLab/Flame. Docker: https://hub.docker.com/r/pavlopouloslab/flame.

While this variety of applications allows focusing on different aspects such as enriching for (i) biological pathways [e.g. KEGG (Kanehisa and Goto 2000), WikiPathways (Slenter et al. 2018), Reactome (Fabregat et al. 2018)], (ii) Gene Ontology [GO (Aleksander et al. 2023)] terms such as biological processes, molecular functions and cellular components, (iii) diseases [e.g. OMIM (Amberger et al. 2015), DisGeNet (Piñero et al. 2020)], (iv) protein complexes [e.g. CORUM (Giurgiu et al. 2019)], (v) protein domains [e.g. Pfam (Finn et al. 2016)], (vi) phenotypes [e.g. HPO (Robinson et al. 2008)], or (vii) regulatory motifs [e.g. TRANSFAC (Matys et al. 2003), miRTarBase (Huang et al. 2020)], an obvious selection of the right tool to cover certain needs is most of the times not straightforward. Moreover, while most of these tools come with a simple interface, little emphasis has been given on the visualization and combination of results, id conversion, parameter selection, result prioritization, record association, support of multiple lists and versatile input options. Therefore, usage and interpretation of results in a more streamlined and combinatorial aspect still remain difficult tasks for the average user.
In this article, we present Flame (v2.0), an advanced service for addressing the aforementioned issues. Flame focuses on usage simplicity, offering many degrees of freedom regarding importing and handling multiple lists, running advanced state-of-the-art functional enrichment analysis pipelines as well as combining and prioritizing results through advanced interactive visualizations. Flame is able to report results as lists but also associate them at a network level (Koutrouli et al. 2020), while it offers major integration of resources (Baltoumas et al. 2021a) such as aGOtool, g: Profiler, WebGestalt, Enrichr, and STRING (Szklarczyk et al. 2023); thus referring to their broad audience while taking full advantage of their strengths.

Input options
Compared to its predecessor (Thanati et al. 2021), Flame (v2.0) comes with a variety of newly introduced input options (Fig. 1). These are mainly: (i) support of simple gene lists, (ii) support of variants and polymorphisms, (iii) support of textmining and Named Entity Recognition (NER) techniques to process free text and mine it for genes and proteins, and (iv) support of gene expression data derived from experiments.
Gene lists: Similarly to Flame v1.0, users can directly paste or upload multiple lists by using a plethora of gene identifiers or chromosomal intervals. Gene and protein names can be imported in the Ensembl (Cunningham et al. 2022), Entrez (Sayers et al. 2022), UniProt (UniProt Consortium 2021), RefSeq (Li et al. 2021), EMBL (Arita et al. 2021), ChEMBL (Mendez et al. 2019), and WikiGene (Maier et al. 2005) namespaces, or be later converted into another namespace during the functional enrichment analysis procedure.
Variants and Polymorphisms: In addition to directly uploading gene lists, Flame is capable of parsing and annotating lists of single nucleotide polymorphisms (SNPs), through the "SNP" input option. The input in this case is a list (whitespace or comma-delimited) of SNP rs-codes in the dbSNP (Sherry et al. 2001) format. The ids are mapped to their corresponding genes and associated Ensembl identifiers through the g: SNPense functionality of g: Profiler and are then searched against dbSNP to retrieve additional metadata, including the chromosome number, strand, genomic coordinates, and the variant's reported effects. The annotated SNPs can be downloaded, while their associated genes can be added to the list of Flame inputs, using either their Entrez gene name or their Ensembl ids. Since dbSNP has dropped support for nonhuman variants in recent versions, this functionality is currently available only for Homo sapiens (human).
Text-Mining and NER: In this version, users can provide free text and utilize the EXTRACT (Pafilis et al. 2016) tagger to perform Named Entity Recognition (NER) to mine it for genes and proteins for an organism of preference. Upon completion, Flame will report the identified genes/proteins in an interactive and easy-to-filter table as well as an annotated text with the identified terms highlighted. In mouse-hover, a popup window will appear, providing information about the entity as well as direct links to the STRING and Ensembl databases. The annotated terms can be added to the list of Flame inputs, using their associated Ensembl ids. A free text which can be further mined for genes and proteins using NER for an organism of preference. The identified genes and proteins can be then used for functional enrichment analysis. (D) Input of gene expression data (gene name, P-value, logFC) to generate a Volcano plot. Users can manually apply thresholds on the P-value and the FC axes to define the differentially over-and under-expressed genes of significance and interactively select them from the plot for further analysis This functionality is supported for all currently available organisms in Flame (14 000 taxa). To this end, it is worth mentioning that the EXTRACT tagger is already being used by a plethora of established text-mining based applications (Pletscher-Frankild et al. 2015, Baltoumas et al. 2021b, Zafeiropoulos et al. 2022, Szklarczyk et al. 2023. Gene Expression data: Defining the significant differentially over-and under-expressed genes is most times an empirical task performed by experimentalists, who usually apply thresholds within an accepted range for parameters such as the fold change (e.g. absolute FC > 2) and the statistical significance (e.g. P-value < 0.05). Therefore, slightly tuning the parameters may result in a more complete biological story. For this purpose, Flame offers the ability to upload expression data in tab delimited or comma separated format (gene name-log fold change-statistical significance) and interactively select sets of genes on a generated Volcano plot. Users can adjust both the P-value and FC thresholds and redraw the plot, highlighting the overexpressed genes in red and the under-expressed genes in blue. Following, users may apply a selection lasso or rectangle to mark gene sets of interest directly on the Volcano plot, and append them to the Flame lists which can then be further processed and enriched.

Integration and functional enrichment for multiple organisms from various sources
Due to the lack of standards in the field (Wijesooriya et al. 2022), functional enrichment applications often differ on the databases they support, the way of processing data, the organisms they cover, their ranking and scoring schemes, their reporting and visualizations as well as the input options they offer. While such differences may hinder usage, a combination of such tools as a means of complementing each other's weaknesses is always an option. However, familiarity with each tool of choice is necessary but often a time consuming process. In Flame (v2.0), we overcome this issue by bringing together four established applications, namely aGOtool, gProfiler, WebGestalt, and enrichR under the same interface and making them available, in a consistent and unified manner. These four tools were selected based on the following criteria: (i) availability of programmatic access, either in the form of a package or through an Application Programming Interface (API), (ii) the number and diversity of supported databases for enrichment, (iii) the number of supported organisms, and (iv) the ability to handle large requests.
In detail, aGOtool (and subsequently Flame) currently supports 14 436 organisms corresponding to UniProt reference proteomes, including animals (mammals, insects, fish, etc.), plants, fungi, protozoa and bacteria. The tool offers functional enrichment for biological processes, molecular functions, and cellular components through Gene Ontology, pathway enrichment through KEGG, Reactome and WikiPathways, disease enrichment through the Disease Ontology (Schriml et al. 2019), annotation enrichment for proteins through UniProt, Pfam and InterPro (Blum et al. 2021) as well as tissue enrichment through the Brenda Tissue Ontology (Gremse et al. 2011). It also offers literature enrichment through PubMed. Regardless of the input identifiers, gene/protein names are converted to the Ensembl protein namespace (ENSP) through the STRING API before the enrichment process. The ENSP namespace returns the most accurate results for the aGOtool enrichment pipeline.
While using the g: Profiler pipeline, extra features such as enriching for tissues through the Human protein Atlas (HPA) (Karlsson et al. 2021), protein annotations through CORUM, phenotypes through the Human Phenotype Ontology (HPO) (Robinson et al. 2008), or DNA regulatory motifs through TRANSFAC and miRTarBase are made available. It is worth mentioning that g: Profiler comes with an advanced id conversion function which enables namespace and orthology conversion for 821 organisms, while it provides great flexibility on the identifiers which it accepts as input.
Through Flame, users can perform enrichment using each individual tool, or can select to combine multiple tools. The combination of tools that can be used depends on their supporting the organism chosen for the analysis. Enrichment can be performed using the entire organism's genome as the reference with any of the four tools; alternatively, users can also upload a custom reference background of their own and perform enrichment with aGOtool, g: Profiler, and WebGestalt.
A summary and a direct comparison of the supported features, cutoff types and allowed namespaces are shown in Table 1. Regarding execution times, aGOtool is the default selected enrichment tool of Flame and performs the fastest while WebGestalt's API executes the slowest and may affect Flame's performance.

Visualization options
Flame extends its functionality by offering a plethora of visualizations and network analysis options for the interpretation of enrichment results. Besides simple searchable and interactive tables, it offers several visual alternatives for presenting the reported results from each enrichment tool (Fig. 2). Flame provides four different main visualization options to represent the top enriched terms. These are: (i) networks, (ii) heatmaps, (iii) barcharts and (i) scatter plots. For each category, one can adjust the top hits interactively by setting up filters and parameters such as datasources, number of visualized top terms and scoring metric. In the case of networks and heatmaps, three different modes of association are offered, namely Functions versus Genes, Functions versus Functions, and Genes versus Genes. In the first mode, the top functions and their associated genes are presented either as a network Flame (v2.0): advanced integration and interpretation of functional enrichment results from multiple sources (nodes linked with edges), or as a heatmap (associations are shown as colored cells/blocks). In the second mode, functions are connected with other functions based on the number of their shared genes, whereas in the third mode, genes are connected based on the common functions they are involved in. In the cases of networks, bar charts and scatter plots, one can adjust the top hits by combining multiple types of resources (e.g. pathways from KEGG, Reactome and WikiPathways). In addition, while most of the network associations are shown in 2D, Flame has the option to call Arena3D web (Karatzas et al. 2021, Kokoli et al. 2023 in order to visualize heterogeneous information in a 3D view as a multi-layered graph (each biomedical entity type corresponding to a different layer).
All aforementioned plots are also available for the literature enrichment pipeline, where instead of functions, PubMed articles are returned. A Manhattan plot can also be generated for g: Profiler exclusively. Finally, as an extra analysis step, Flame utilizes the STRING API to generate protein-protein interaction networks, which can subsequently be used to also perform enrichment analysis.

Use of UpSet plots for comparing input lists and reported results from various sources
Flame (v1.0) uses UpSet plots, as a replacement for Venn diagrams, in order to show unions, intersections and distinct combinations among the input gene lists. Following the same approach, in this version, Flame (v2.0) extends this functionality to show how the four supported back-end pipelines (aGOtool, gProfiler, WebGestalt, enrichR) agree on the results they report (Fig. 3).
To this end, when more than one functional enrichment tool has been executed, an extra tab named "Combination" is generated automatically. In this tab, distinct intersections between the four resources are summarized in a searchable and easy-to-filter table as well as in an interactive UpSet plot. The table reports the aggregated enriched terms along with the tool(s) that generated the term and its respective rank (number of tools that returned it). Rank 1 means that a biomedical entity (e.g. an enriched pathway) was detected by one resource only, while rank 4 means that all four tools fetched the same result. In addition to the above, the statistical significance of the combined results is estimated by applying Fisher's combined probability test (Mosteller and Fisher 1948), represented by X 2 and combined P-value metrics for each enrichment term identified by two or more tools.
In parallel, Flame (v2.0) generates an interactive UpSet plot where users can click on a bar to generate a table which corresponds to the subset of shared terms shown in the corresponding UpSet column. In this implementation, the UpSet plot consists of two to four rows (one for each enrichment tool) while columns describe the distinct intersections among results. Both the table and the UpSet plot can be limited to a user-selected set of biomedical entries (e.g. compare reported pathways, or biological processes or both). In addition, Flame offers the option to visualize the combined results as a Functions versus Genes association network (as described in the previous section), in which the weight of the edges corresponds to the number of tools supporting the corresponding gene-function associations.

Implementation
Flame is mainly written in R (Shiny framework). The enrichment analysis with aGOtool is executed through its corresponding API (https://agotool.org/api_orig). Enrichment  library. The volcano plot input along with all other enriched term plots (heatmaps, barcharts and scatter plots) are generated with the use of the plotly R library. A Manhattan plot is also offered, specifically for the g: Profiler results, through its respective package. Finally, UpSet plots are drawn with the use of upsetjs R library. The SNP input and namespace/orthology conversions are handled through the g: Profiler package. Text-mining input is handled by the EXTRACT tagger API (https://tagger.jensen lab.org). The PPI network analysis is offered by the STRING API (https://string-db.org/api). Flame handles its GET API requests through R and its POST API requests through a dedicated node.js server.

Discussion
Flame (v2.0) expands on its previous capabilities in many aspects (Table 2); from offering users a plethora of different input options, to a significant organism-space expansion, the addition of multiple enrichment libraries/APIs and the combination of their results, as well as through enhancing its own API. We believe that the multiple input options along with STRING's organisms, make Flame a very competitive application which greatly enhances user experience. By also offering the streamlined execution of multiple functional enrichment tools and the aggregation of their results, we facilitate the knowledge extraction of analysis, and partially tackle the  1) Gene/identifier lists 1) Gene/identifier lists 2) SNP lists 3) NER and text-mining on a free text 4) Gene selection from an interactive volcano plot 5) Support of custom background lists List comparison with the use of UpsetPlots 1) Input gene lists 1) input gene lists 2) reported results from any of the four supported tools Other types of analysis 1) Protein-protein interaction networks 2) Gene orthology search 3) Database id conversion 1) Protein-protein interaction networks 2) Gene orthology search 3) Database id conversion 4) Mapping of SNPs to genes 5) Statistical testing when combining multiple P-values from various tools 6) Functional and literature enrichment for the generated STRING PPI network Visualization 1) Plots (Bar, UpSet, Scatter, Manhattan) 2) Heatmaps 3) 2D Networks with the use of VisNetwork 1) Plots (Bar, UpSet, Scatter, Manhattan) 2) Heatmaps 3) 2D Networks with the use of VisNetwork 4) Gene-function network visualization for entities from multiple sources 5) 3D multilayer graphs with Arena3D web API requests 1) GET 1) GET 2) POST question "which tool should I use/trust?". Given its accessibility and ease of use, we hope Flame will quickly become the go-to tool for advanced functional enrichment analysis and visualization.